AITopics | best-arm identification

Collaborating Authors

best-arm identification

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Best of both worlds: Stochastic & adversarial best-arm identification

Abbasi-Yadkori, Yasin, Bartlett, Peter L., Gabillon, Victor, Malek, Alan, Valko, Michal

arXiv.org Machine LearningApr-17-2026

We study bandit best-arm identification with arbitrary and potentially adversarial rewards. A simple random uniform learner obtains the optimal rate of error in the adversarial scenario. However, this type of strategy is suboptimal when the rewards are sampled stochastically. Therefore, we ask: Can we design a learner that performs optimally in both the stochastic and adversarial problems while not being aware of the nature of the rewards? First, we show that designing such a learner is impossible in general. In particular, to be robust to adversarial rewards, we can only guarantee optimal rates of error on a subset of the stochastic problems. We give a lower bound that characterizes the optimal rate in stochastic problems if the strategy is constrained to be robust to adversarial rewards. Finally, we design a simple parameter-free algorithm and show that its probability of error matches (up to log factors) the lower bound in stochastic problems, and it is also robust to adversarial ones.

bestofbothworld, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

2604.1486

Country:

Oceania > Australia (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > Saxony-Anhalt > Magdeburg (0.04)
Europe > France > Hauts-de-France > Pas-de-Calais (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (0.68)
Information Technology > Artificial Intelligence > Machine Learning (0.46)

Add feedback

bd33f02c4e28615b5af2d24703e066d5-Supplemental.pdf

Neural Information Processing SystemsFeb-10-2026, 22:56:49 GMT

algorithm, optimal arm, probability, (14 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Europe > Germany > Saxony-Anhalt > Magdeburg (0.04)
Europe > Germany > Brandenburg > Potsdam (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.48)

Add feedback

MinimaxOptimalFixed-BudgetBestArm IdentificationinLinearBandits

Neural Information Processing SystemsFeb-8-2026, 21:58:04 GMT

Complementarily, we present a minimax lower bound for this problem.

artificial intelligence, bandit, data mining, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.48)

Add feedback

Best Arm Identification with LLM Judges and Limited Human

Ao, Ruicheng, Chen, Hongyu, Gao, Siyang, Li, Hanwei, Simchi-Levi, David

arXiv.org Machine LearningJan-30-2026

We study fixed-confidence best-arm identification (BAI) where a cheap but potentially biased proxy (e.g., LLM judge) is available for every sample, while an expensive ground-truth label can only be acquired selectively when using a human for auditing. Unlike classical multi-fidelity BAI, the proxy is biased (arm- and context-dependent) and ground truth is selectively observed. Consequently, standard multi-fidelity methods can mis-select the best arm, and uniform auditing, though accurate, wastes scarce resources and is inefficient. We prove that without bias correction and propensity adjustment, mis-selection probability may not vanish (even with unlimited proxy data). We then develop an estimator for the mean of each arm that combines proxy scores with inverse-propensity-weighted residuals and form anytime-valid confidence sequences for that estimator. Based on the estimator and confidence sequence, we propose an algorithm that adaptively selects and audits arms. The algorithm concentrates audits on unreliable contexts and close arms and we prove that a plug-in Neyman rule achieves near-oracle audit efficiency. Numerical experiments confirm the theoretical guarantees and demonstrate the superior empirical performance of the proposed algorithm.

large language model, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

2601.21471

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > China > Hong Kong (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.85)

Add feedback

Optimal Multi-Fidelity Best-Arm Identification

Neural Information Processing SystemsDec-27-2025, 10:11:45 GMT

In bandit best-arm identification, an algorithm is tasked with finding the arm with highest mean reward with a specified accuracy as fast as possible. We study multi-fidelity best-arm identification, in which the algorithm can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost. Several methods have been proposed for tackling this problem, but their optimality remain elusive, notably due to loose lower bounds on the total cost needed to identify the best arm. Our first contribution is a tight, instance-dependent lower bound on the cost complexity. The study of the optimization problem featured in the lower bound provides new insights to devise computationally efficient algorithms, and leads us to propose a gradient-based approach with asymptotically optimal cost complexity. We demonstrate the benefits of the new algorithm compared to existing methods in experiments. Our theoretical and empirical findings also shed light on an intriguing concept of optimal fidelity for each arm.

artificial intelligence, name change, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.40)

Add feedback

Bandits with many optimal arms

Neural Information Processing SystemsDec-24-2025, 20:27:00 GMT

We consider a stochastic bandit problem with a possibly infinite number of arms. We write $p^*$ for the proportion of optimal arms and $\Delta$ for the minimal mean-gap between optimal and sub-optimal arms. We characterize the optimal learning rates both in the cumulative regret setting, and in the best-arm identification setting in terms of the problem parameters $T$ (the budget), $p^*$ and $\Delta$. For the objective of minimizing the cumulative regret, we provide a lower bound of order $\Omega(\log(T)/(p^*\Delta))$ and a UCB-style algorithm with matching upper bound up to a factor of $\log(1/\Delta)$. Our algorithm needs $p^*$ to calibrate its parameters, and we prove that this knowledge is necessary, since adapting to $p^*$ in this setting is impossible. For best-arm identification we also provide a lower bound of order $\Omega(\exp(-cT\Delta^2p^*))$ on the probability of outputting a sub-optimal arm where $c> 0$ is an absolute constant. We also provide an elimination algorithm with an upper bound matching the lower bound up to a factor of order $\log(T)$ in the exponential, and that does not need $p^*$ or $\Delta$ as parameter. Our results apply directly to the three related problems of competing against the $j$-th best arm, identifying an $\epsilon$ good arm, and finding an arm with mean larger than a quantile of a known order.

delta, name change, optimal arm, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.76)

Add feedback

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing SystemsOct-3-2025, 05:56:49 GMT

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. In this problem there is a hidden unknown parameter \theta*\in R^d and a finite set of arms X\subseteq R^d. When an arm is pulled you observe a reward x^T\theta*+epsilon where epsilon is a zero mean i.i.d noise with bounded range. The goal is to identify argmax_{x\in X} x^T\theta* with the least number of samples. Results: The paper characterizes the sample complexity of static and dynamic allocation strategies to identify the best arm.

algorithm, complexity, identification, (13 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.47)

Add feedback